expert module
Access Controls Will Solve the Dual-Use Dilemma
AI safety systems face the dual-use dilemma. It is unclear whether to answer dual-use requests, since the same query could be either harmless or harmful depending on who made it and why. To make better decisions, such systems would need to examine requests' real-world context, but currently, they lack access to this information. Instead, they sometimes end up making arbitrary choices that result in refusing legitimate queries and allowing harmful ones, which hurts both utility and safety. To address this, we propose a conceptual framework based on access controls where only verified users can access dual-use outputs. We describe the framework's components, analyse its feasibility, and explain how it addresses both over-refusals and under-refusals. While only a high-level proposal, our work takes the first step toward giving model providers more granular tools for managing dual-use content. Such tools would enable users to access more capabilities without sacrificing safety, and offer regulators new options for targeted policies.
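The abstract describes the framework only at a high level. As a concrete illustration of the core idea, the sketch below gates a dual-use answer behind verification tiers; the tier names, the classifier stub, and the three response options are assumptions made for this example, not components specified by the paper.

```python
# Minimal illustration (not the paper's implementation) of gating dual-use
# outputs behind verified access tiers. Tier names, the classifier stub, and
# the policy are assumptions made for this sketch.
from dataclasses import dataclass
from enum import Enum, auto


class Verification(Enum):
    ANONYMOUS = auto()      # no identity or purpose established
    VERIFIED_USER = auto()  # identity verified by the provider
    VETTED_EXPERT = auto()  # identity plus a credentialed legitimate use


@dataclass
class Request:
    text: str
    user_tier: Verification


def classify_dual_use(text: str) -> bool:
    """Stand-in for a dual-use classifier; a real system would use a model."""
    return "synthesis route" in text.lower()


def decide(request: Request) -> str:
    """Serve benign queries to everyone; gate dual-use ones by access tier."""
    if not classify_dual_use(request.text):
        return "answer"
    if request.user_tier is Verification.VETTED_EXPERT:
        return "answer"           # verified legitimate use: avoids over-refusal
    if request.user_tier is Verification.VERIFIED_USER:
        return "safe_completion"  # partial, high-level answer only
    return "refuse"               # unverified context: avoids under-refusal


if __name__ == "__main__":
    print(decide(Request("Explain this synthesis route", Verification.ANONYMOUS)))
```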
PRISM: Periodic Representation with multIscale and Similarity graph Modelling for enhanced crystal structure property prediction
Solé, Àlex, Mosella-Montoro, Albert, Cardona, Joan, Aravena, Daniel, Gómez-Coca, Silvia, Ruiz, Eliseo, Ruiz-Hidalgo, Javier
Crystal structures are characterised by repeating atomic patterns within unit cells across three-dimensional space, posing unique challenges for graph-based representation learning. Current methods often overlook essential periodic boundary conditions and multiscale interactions inherent to crystalline structures. In this paper, we introduce PRISM, a graph neural network framework that explicitly integrates multiscale representations and periodic feature encoding by employing a set of expert modules, each specialised in encoding distinct structural and chemical aspects of periodic systems. Extensive experiments across crystal structure-based benchmarks demonstrate that PRISM improves state-of-the-art predictive accuracy, significantly enhancing crystal property prediction.
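To make the expert-module idea concrete, here is a minimal sketch in which several specialised encoders each process one view of a structure and their outputs are fused for property prediction. PRISM itself is a graph neural network with periodic and multiscale encodings; the dense encoders, feature views, and dimensions below are placeholder assumptions, not the authors' architecture.

```python
# Sketch of the expert-module fusion idea only; not PRISM's graph architecture.
import torch
import torch.nn as nn


class ExpertModule(nn.Module):
    """One expert specialised in a single view of the structure."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.SiLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiExpertPropertyHead(nn.Module):
    """Fuses per-view expert embeddings and regresses a scalar property."""

    def __init__(self, view_dims: dict, out_dim: int = 64):
        super().__init__()
        self.experts = nn.ModuleDict(
            {name: ExpertModule(d, 128, out_dim) for name, d in view_dims.items()}
        )
        self.readout = nn.Linear(out_dim * len(view_dims), 1)

    def forward(self, views: dict) -> torch.Tensor:
        fused = torch.cat([self.experts[k](v) for k, v in views.items()], dim=-1)
        return self.readout(fused)


# Toy usage: three assumed "views" of 4 crystals (local geometry,
# periodic/multiscale context, composition), each a precomputed feature vector.
model = MultiExpertPropertyHead({"local": 32, "periodic": 16, "composition": 8})
batch = {"local": torch.randn(4, 32), "periodic": torch.randn(4, 16),
         "composition": torch.randn(4, 8)}
print(model(batch).shape)  # torch.Size([4, 1])
```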
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
Li, Cheng, Liu, Jiexiong, Chen, Yixuan, Ji, Jie
Transformer models based on the Mixture of Experts (MoE) architecture have made significant progress in long-sequence modeling, but existing models still have shortcomings in computational efficiency and the ability to capture long-range dependencies, especially in terms of the dynamic adaptability of expert resource allocation. In this paper, we propose a Dynamic Adaptive Shared Expert and Grouped Multi-Head Attention Hybrid Model (DASG-MoE) to enhance long-sequence modeling capabilities by integrating three modules. First, we employ the Grouped Multi-Head Attention (GMHA) mechanism to effectively reduce the computational complexity of long sequences. By parallel processing through sequence grouping, local sliding window attention, and feature aggregation, we address long-range dependency issues and the model's lack of generalization for local information. Second, we design a Dual-Scale Shared Expert Structure (DSSE), where shallow experts use lightweight computations to quickly respond to low-dimensional features, while deep experts process high-dimensional complex semantics through pre-training transfer and post-training optimization, achieving a dynamic balance between efficiency and accuracy. Third, we propose a hierarchical Adaptive Dynamic Routing (ADR) mechanism that dynamically selects expert levels based on feature complexity and task requirements, and optimizes resource allocation through a local expert activation strategy. Experiments on multiple long-sequence benchmark datasets demonstrate that our DASG-MoE model outperforms state-of-the-art models.
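A minimal sketch of the dual-scale expert and adaptive-routing idea follows: a router scores each token's complexity and blends a shallow and a deep expert accordingly. The scoring rule, sizes, and threshold are assumptions for illustration, and both experts run densely for clarity, whereas DASG-MoE activates experts selectively.

```python
# Illustrative sketch of dual-scale experts with a complexity-based router;
# not DASG-MoE's actual mechanism.
import torch
import torch.nn as nn


class DualScaleRoutedFFN(nn.Module):
    def __init__(self, d_model: int = 64, threshold: float = 0.5):
        super().__init__()
        self.shallow = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.deep = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.router = nn.Linear(d_model, 1)   # per-token "complexity" score
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        score = torch.sigmoid(self.router(x))    # (batch, seq, 1)
        hard = (score > self.threshold).float()  # route "hard" tokens to the deep expert
        # Both experts are computed densely here for clarity; a real MoE
        # would dispatch tokens sparsely to save compute.
        return hard * self.deep(x) + (1.0 - hard) * self.shallow(x)


x = torch.randn(2, 10, 64)
print(DualScaleRoutedFFN()(x).shape)  # torch.Size([2, 10, 64])
```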
A Multi-Expert Structural-Semantic Hybrid Framework for Unveiling Historical Patterns in Temporal Knowledge Graphs
Deng, Yimin, Wu, Yuxia, Wang, Yejing, Zhao, Guoshuai, Zhu, Li, Liu, Qidong, Xu, Derong, Fu, Zichuan, Wu, Xian, Zheng, Yefeng, Zhao, Xiangyu, Qian, Xueming
Temporal knowledge graph reasoning aims to predict future events with knowledge of existing facts and plays a key role in various downstream tasks. Previous methods focused on either graph structure learning or semantic reasoning, failing to integrate dual reasoning perspectives to handle different prediction scenarios. Moreover, they lack the capability to capture the inherent differences between historical and non-historical events, which limits their generalization across different temporal contexts. To this end, we propose a Multi-Expert Structural-Semantic Hybrid (MESH) framework that employs three kinds of expert modules to integrate both structural and semantic information, guiding the reasoning process for different events. Extensive experiments on three datasets demonstrate the effectiveness of our approach.
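As a rough illustration of the structural-semantic hybrid, the sketch below has two stand-in experts score candidate entities and a learned gate mix their distributions per query; the dimensions and the gating rule are assumptions, not MESH's actual design.

```python
# Hedged sketch: mix entity-score distributions from a structural expert and
# a semantic expert with a per-query gate.
import torch
import torch.nn as nn


class HybridScorer(nn.Module):
    def __init__(self, d: int = 32, num_entities: int = 100):
        super().__init__()
        self.structural = nn.Linear(d, num_entities)  # stand-in graph-structure expert
        self.semantic = nn.Linear(d, num_entities)    # stand-in semantic expert
        self.gate = nn.Sequential(nn.Linear(d, 2), nn.Softmax(dim=-1))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        w = self.gate(query)                            # (batch, 2) expert weights
        scores = torch.stack(
            [self.structural(query), self.semantic(query)], dim=1
        )                                               # (batch, 2, num_entities)
        return (w.unsqueeze(-1) * scores).sum(dim=1)    # mixed entity scores


q = torch.randn(4, 32)  # embeddings of (subject, relation, timestamp) queries
print(HybridScorer()(q).shape)  # torch.Size([4, 100])
```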
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts
Lee, Rhui Dih, Wynter, Laura, Ganti, Raghu Kiran
We present a toolkit for creating low-cost Mixture-of-Domain-Experts (MOE) from trained models. The toolkit can be used for creating a mixture from models or from adapters. We perform extensive tests and offer guidance on defining the architecture of the resulting MOE using the toolkit. A public repository is available.
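The abstract does not show the toolkit's API, so the sketch below only illustrates the underlying idea: reuse feed-forward blocks taken from already trained domain models as experts behind a freshly initialised router. All names and sizes are assumptions.

```python
# Not the toolkit's API; a minimal sketch of mixing trained FFNs into an MoE.
import torch
import torch.nn as nn


def make_ffn(d_model: int) -> nn.Module:
    return nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model))


class MixtureOfDomainExperts(nn.Module):
    def __init__(self, experts, d_model: int, top_k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(experts)            # FFNs taken from trained models
        self.router = nn.Linear(d_model, len(experts))   # the only newly trained weights
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                                   # (..., n_experts)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Dense loop for clarity; real MoE layers dispatch tokens sparsely.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).float()
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out


# Pretend these two FFNs were extracted from two fine-tuned domain models.
math_ffn, code_ffn = make_ffn(64), make_ffn(64)
moe = MixtureOfDomainExperts([math_ffn, code_ffn], d_model=64)
print(moe(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```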
Mixture of Experts based Multi-task Supervised Learning from Crowds
Han, Tao, Shi, Huaixuan, Ding, Xinyi, Ma, Xiao, Gu, Huamao, Fang, Yili
Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer the ground truth. However, worker behavior models that rely on ground truth hidden variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and negatively impacting the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, utilizes clustering methods in the worker spectral space to identify the projection vector of the oracle worker. Subsequently, the labels generated from this vector are taken as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill in the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and that MMLC-df enhances the quality of existing truth inference methods.
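A hedged sketch of the shared-experts-plus-per-worker-gate idea: all workers share item-feature experts, and each worker owns a mixing vector over them. The shapes and the crude oracle estimate (averaging worker vectors rather than clustering in the worker spectral space) are illustrative assumptions, not MMLC's exact formulation.

```python
# Sketch of a feature-level worker behavior model with shared experts and
# per-worker gates; the oracle estimate below is a crude stand-in.
import torch
import torch.nn as nn


class CrowdMoE(nn.Module):
    def __init__(self, in_dim: int, n_experts: int, n_classes: int, n_workers: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )
        # One mixing vector per worker over the shared experts.
        self.worker_gates = nn.Parameter(torch.zeros(n_workers, n_experts))

    def forward(self, x: torch.Tensor, worker_ids: torch.Tensor) -> torch.Tensor:
        expert_logits = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, C)
        gate = self.worker_gates[worker_ids].softmax(-1).unsqueeze(-1)    # (B, E, 1)
        return (gate * expert_logits).sum(dim=1)      # per-worker label logits

    def oracle_logits(self, x: torch.Tensor) -> torch.Tensor:
        # Stand-in for MMLC-owf: use the average worker vector as the "oracle".
        expert_logits = torch.stack([e(x) for e in self.experts], dim=1)
        gate = self.worker_gates.mean(0).softmax(-1).view(1, -1, 1)
        return (gate * expert_logits).sum(dim=1)


model = CrowdMoE(in_dim=16, n_experts=4, n_classes=3, n_workers=10)
x, w = torch.randn(8, 16), torch.randint(0, 10, (8,))
print(model(x, w).shape, model.oracle_logits(x).shape)  # (8, 3) (8, 3)
```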
M3oE: Multi-Domain Multi-Task Mixture-of-Experts Recommendation Framework
Zhang, Zijian, Liu, Shuchang, Yu, Jiaao, Cai, Qingpeng, Zhao, Xiangyu, Zhang, Chunxu, Liu, Ziru, Liu, Qidong, Zhao, Hongwei, Hu, Lantao, Jiang, Peng, Gai, Kun
Multi-domain recommendation and multi-task recommendation have demonstrated their effectiveness in leveraging common information from different domains and objectives for comprehensive user modeling. Nonetheless, practical recommendation systems usually face multiple domains and tasks simultaneously, which current methods cannot address well. To this end, we introduce M3oE, an adaptive Multi-domain Multi-task Mixture-of-Experts recommendation framework. M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives. We leverage three mixture-of-experts modules to learn common, domain-aspect, and task-aspect user preferences respectively, addressing the complex dependencies among multiple domains and tasks in a disentangled manner. Additionally, we design a two-level fusion mechanism for precise control over feature extraction and fusion across diverse domains and tasks. The framework's adaptability is further enhanced by applying an AutoML technique, which allows dynamic structure optimization. To the best of the authors' knowledge, M3oE is the first effort to solve multi-domain multi-task recommendation self-adaptively. Extensive experiments on two benchmark datasets against diverse baselines demonstrate M3oE's superior performance. The implementation code is available to ensure reproducibility.
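The sketch below shows only the three-branch idea, with shared, domain-aspect, and task-aspect expert groups fused before a task head; M3oE's gating, two-level fusion, and AutoML-driven structure search are richer than this, and all sizes here are assumed.

```python
# Sketch of shared / domain-aspect / task-aspect expert branches with a
# simple concatenation fusion; not M3oE's actual architecture.
import torch
import torch.nn as nn


class SimpleMoE(nn.Module):
    def __init__(self, d: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d, d) for _ in range(n_experts)])
        self.gate = nn.Linear(d, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x).softmax(-1)                              # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, d)
        return (w.unsqueeze(-1) * outs).sum(1)


class ThreeBranchRecommender(nn.Module):
    def __init__(self, d: int = 32, n_domains: int = 2, n_tasks: int = 2):
        super().__init__()
        self.shared = SimpleMoE(d, 4)                                      # common preferences
        self.domain = nn.ModuleList([SimpleMoE(d, 2) for _ in range(n_domains)])
        self.task = nn.ModuleList([SimpleMoE(d, 2) for _ in range(n_tasks)])
        self.heads = nn.ModuleList([nn.Linear(3 * d, 1) for _ in range(n_tasks)])

    def forward(self, x: torch.Tensor, domain: int, task: int) -> torch.Tensor:
        fused = torch.cat([self.shared(x), self.domain[domain](x),
                           self.task[task](x)], dim=-1)
        return torch.sigmoid(self.heads[task](fused))   # e.g. click probability


model = ThreeBranchRecommender()
print(model(torch.randn(8, 32), domain=1, task=0).shape)  # torch.Size([8, 1])
```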
Cantor: Inspiring Multimodal Chain-of-Thought of MLLM
Gao, Timin, Chen, Peixian, Zhang, Mengdan, Fu, Chaoyou, Shen, Yunhang, Zhang, Yan, Zhang, Shengchuan, Zheng, Xiawu, Sun, Xing, Cao, Liujuan, Ji, Rongrong
With the advent of large language models(LLMs) enhanced by the chain-of-thought(CoT) methodology, visual reasoning problem is usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of the potential "determining hallucinations" in decision-making due to insufficient visual information and the limitation of low-level perception tools that fail to provide abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models(MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/ .
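Since Cantor works by prompting a single MLLM into different roles, a rough orchestration sketch is given below; the call_mllm placeholder, the role prompts, and the two-stage structure are paraphrased assumptions for illustration, not the project's actual prompts or code.

```python
# Rough sketch of a perception-decision prompting flow; call_mllm is a
# hypothetical placeholder for any multimodal LLM API.
from typing import Callable, Optional

MLLM = Callable[[str, Optional[bytes]], str]  # (prompt, optional image) -> text


def cantor_style_answer(question: str, image: bytes, call_mllm: MLLM) -> str:
    # Stage 1: decision generation -- inspect the image and question, then
    # assign sub-tasks to "expert" roles.
    plan = call_mllm(
        "You are a decision generator. Given the image and question below, "
        "list the sub-tasks needed and which expert (e.g. object counting, "
        "text reading, spatial reasoning) should handle each.\n" + question,
        image,
    )
    # Stage 2: the same MLLM acts as each expert, producing higher-level
    # observations rather than raw low-level tool outputs.
    observations = call_mllm(
        "Act as each expert listed in this plan and report your findings:\n" + plan,
        image,
    )
    # Final synthesis: combine the plan and expert findings into the answer.
    return call_mllm(
        f"Question: {question}\nPlan: {plan}\nFindings: {observations}\n"
        "Give the final answer with a short rationale.",
        None,
    )
```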